Text punctuation restoration for Vietnamese speech recognition with multimodal features
Hua LAI, Tong SUN, Wenjun WANG, Zhengtao YU, Shengxiang GAO, Ling DONG
Journal of Computer Applications    2024, 44 (2): 418-423.   DOI: 10.11772/j.issn.1001-9081.2023020231

The text sequence output by a Vietnamese speech recognition system lacks punctuation, and restoring punctuation to the recognized text helps eliminate ambiguity and makes the text easier to understand. However, punctuation restoration models based on the text modality alone predict punctuation inaccurately on noisy text, because phoneme errors occur frequently in Vietnamese speech recognition and can destroy the semantics of the text. A punctuation restoration method for Vietnamese speech recognition text that exploits multimodal features was therefore proposed, using intonation pauses and tone changes in Vietnamese speech to guide correct punctuation prediction on noisy text. Specifically, Mel-Frequency Cepstral Coefficients (MFCC) were used to extract speech features, a pre-trained language model was used to extract textual context features, and the speech and text features were fused with a label attention mechanism, thereby enhancing the model's ability to learn contextual information from noisy Vietnamese text. Experimental results show that, compared with punctuation restoration models based on Transformer and BERT (Bidirectional Encoder Representations from Transformers) that extract only text features, the proposed method improves precision, recall, and F1 score on a Vietnamese dataset by at least 10 percentage points, demonstrating that fusing speech and text features effectively improves punctuation prediction accuracy for noisy Vietnamese speech recognition text.
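The fusion step described above can be sketched in a few lines. The following is an illustrative NumPy toy, not the paper's actual model: per-token text features and frame-aligned speech features are concatenated, and a set of punctuation-label embeddings attends over the tokens to produce label-specific summaries. All dimensions and names here are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def label_attention_fuse(text_feats, speech_feats, label_emb):
    """Fuse token-level text features with aligned speech features,
    then let each punctuation-label embedding attend over the tokens.
    Returns one fused summary vector per label."""
    fused = np.concatenate([text_feats, speech_feats], axis=-1)  # (T, d_t + d_s)
    scores = label_emb @ fused.T                 # (L, T): label-to-token affinities
    weights = softmax(scores, axis=-1)           # attention over tokens per label
    return weights @ fused                       # (L, d_t + d_s)

rng = np.random.default_rng(0)
T, d_t, d_s, L = 6, 8, 4, 3          # tokens, text dim, speech dim, punctuation labels
text = rng.normal(size=(T, d_t))     # stand-in for pre-trained LM features
speech = rng.normal(size=(T, d_s))   # stand-in for MFCC-derived features
labels = rng.normal(size=(L, d_t + d_s))
out = label_attention_fuse(text, speech, labels)
print(out.shape)  # (3, 12)
```

In a real system the speech features would come from MFCC extraction over the audio aligned to each token, and the label summaries would feed a per-token classifier; this sketch only shows the fusion and attention shapes.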

Table and Figures | Reference | Related Articles | Metrics
Neural machine translation method based on source language syntax enhanced decoding
Longchao GONG, Junjun GUO, Zhengtao YU
Journal of Computer Applications    2022, 42 (11): 3386-3394.   DOI: 10.11772/j.issn.1001-9081.2021111963

Transformer, one of the best existing machine translation models, is based on the standard end-to-end structure and relies only on pairs of parallel sentences, on the assumption that it can learn knowledge from the corpus automatically. However, this modeling approach lacks explicit guidance and cannot effectively mine deep linguistic knowledge, especially in low-resource settings with limited corpus size and quality, where sentence encoding has no prior knowledge constraints and translation quality declines. To alleviate these issues, a neural machine translation model based on source-language syntax enhanced decoding, named SSED (Source language Syntax Enhanced Decoding), was proposed to explicitly use source-language syntax to guide encoding. First, a syntax-aware mask mechanism based on the syntactic information of the source sentence was constructed, and an additional syntax-dependent representation was generated by guiding the encoder's self-attention. The syntax-dependent representation was then used as a supplement to the representation of the original sentence and integrated into the decoding process via an attention mechanism, so that the two jointly guided the generation of the target language and realized the enhancement from prior syntax. Experimental results on several standard IWSLT (International Conference on Spoken Language Translation) and WMT (Conference on Machine Translation) evaluation task test sets show that, compared with the baseline Transformer model, the proposed method obtains BLEU score improvements of 0.84 to 3.41, achieving state-of-the-art results among syntax-related approaches. The fusion of syntactic information with the self-attention mechanism is effective: source-language syntax can guide the decoding process of a neural machine translation system and significantly improve translation quality.
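The syntax-aware mask idea can be illustrated with a minimal sketch. This is an assumption-laden toy, not SSED itself: a boolean mask built from dependency-head indices restricts single-head self-attention so each token attends only to itself, its head, and its children, yielding a syntax-dependent representation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def syntax_mask(heads):
    """Boolean (n, n) mask from dependency-head indices: each token may
    attend to itself, its head, and its children. -1 marks the root."""
    n = len(heads)
    mask = np.eye(n, dtype=bool)
    for child, head in enumerate(heads):
        if head >= 0:
            mask[child, head] = True   # child attends to its head
            mask[head, child] = True   # head attends to its child
    return mask

def masked_self_attention(x, mask):
    """Single-head self-attention; disallowed positions are set to -inf
    before the softmax, so they receive zero weight."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)
    return softmax(scores, axis=-1) @ x

# Toy 4-token sentence; token 1 is the root, tokens 0 and 2 depend on it,
# and token 3 depends on token 2.
heads = [1, -1, 1, 2]
x = np.random.default_rng(1).normal(size=(4, 8))
y = masked_self_attention(x, syntax_mask(heads))
print(y.shape)  # (4, 8)
```

In the full model this syntax-dependent representation would supplement the ordinary encoder output and be attended to during decoding; the sketch only shows how a dependency tree becomes an attention mask.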

Event detection without trigger words incorporating syntactic information
Cui WANG, Yafei ZHANG, Junjun GUO, Shengxiang GAO, Zhengtao YU
Journal of Computer Applications    2021, 41 (12): 3534-3539.   DOI: 10.11772/j.issn.1001-9081.2021060928

Event Detection (ED) is one of the most important tasks in information extraction, aiming to identify instances of specific event types in text. Existing ED methods usually express syntactic dependencies with an adjacency matrix, which then has to be encoded with a Graph Convolutional Network (GCN) to obtain syntactic information, increasing model complexity. Therefore, an event detection method without trigger words that incorporates syntactic information was proposed. After converting each dependent parent word and its context into a position marker vector, the word embedding of the dependent child word was incorporated at the source end of the model in a parameter-free manner to strengthen the semantic representation of the context, without requiring a GCN for encoding. In addition, because labeling trigger words is time-consuming and laborious, a type perceptron based on the multi-head attention mechanism was designed to model the potential trigger words in a sentence and complete event detection without trigger words. To verify the performance of the proposed method, experiments were conducted on the ACE2005 dataset and a low-resource Vietnamese dataset. Compared with the Event Detection Using Graph Transformer Network (GTN-ED) method, the F1 score of the proposed method increased by 3.7% on the ACE2005 dataset; compared with the binary classification method TBNNAM (Type-aware Bias Neural Network with Attention Mechanisms), the F1 score increased by 9% on the Vietnamese dataset. The results show that integrating syntactic information into Transformer can effectively connect scattered event information in a sentence and improve the accuracy of event detection.
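The two ingredients above can be sketched together. This is a hypothetical NumPy illustration, not the published model: a parameter-free step mixes each token's embedding with its dependency parent's embedding (standing in for the position-marker fusion), and a simplified multi-head "type perceptron" lets each event-type embedding attend over the tokens to produce one score per type, with no trigger-word labels involved.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def add_parent_embedding(emb, heads):
    """Parameter-free source-side fusion: average each token's embedding
    with its dependency parent's; root tokens (head == -1) are unchanged."""
    out = emb.copy()
    for child, head in enumerate(heads):
        if head >= 0:
            out[child] = (emb[child] + emb[head]) / 2
    return out

def type_perceptron(tokens, type_emb, n_heads=2):
    """Multi-head attention scorer: each event-type embedding attends over
    the tokens (modeling potential triggers) and yields one score per type."""
    T, d = tokens.shape
    L, _ = type_emb.shape
    dh = d // n_heads
    scores = np.zeros((L, T))
    for h in range(n_heads):                     # split dims into heads
        q = type_emb[:, h * dh:(h + 1) * dh]
        k = tokens[:, h * dh:(h + 1) * dh]
        scores += q @ k.T / np.sqrt(dh)
    attn = softmax(scores, axis=-1)              # (L, T) weights over tokens
    ctx = attn @ tokens                          # type-specific sentence summary
    return (ctx * type_emb).sum(axis=-1)         # one logit per event type

rng = np.random.default_rng(2)
emb = rng.normal(size=(5, 8))                    # 5 tokens, dim 8
fused = add_parent_embedding(emb, heads=[1, -1, 1, 2, 2])
logits = type_perceptron(fused, type_emb=rng.normal(size=(3, 8)))
print(logits.shape)  # (3,)
```

A sigmoid over each logit would then give a per-type "event present" decision, matching the trigger-free, sentence-level formulation described in the abstract.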
